A robust two-level incomplete factorization for 
(Navier-) Stokes saddle point matrices 



Fred W. Wubs* and Jonas Thies^ 
June 10, 2010 

Abstract 

We present a new hybrid direct /iterative approach to the solution of 
a special class of saddle point matrices arising from the discretization of 
the steady incompressible Navier-Stokes equations on an Arakawa C-grid. 
The two- level method introduced here has the following properties: (i) it is 
very robust, even close to the point where the solution becomes unstable; 
(ii) a single parameter controls fill and convergence, making the method 
straightforward to use; (iii) the convergence rate is independent of the 
number of unknowns; (iv) it can be implemented on distributed memory 
machines in a natural way; (v) the matrix on the second level has the same 
structure and numerical properties as the original problem, so the method 
can be applied recursively; (vi) the iteration takes place in the divergence- 
free space, so the method qualifies as a 'constraint preconditioner'; (vii) 
the approach can also be applied to Poisson problems. 

This work is also relevant for problems in which similar saddle point 
matrices occur, for instance when simulating electrical networks, where 
one has to satisfy Kirchhoff's conservation law for currents. 
Keywords: Saddle point problem, indefinite matrix, J^-matrix, incom- 
plete factorization, grid-independent convergence, Arakawa C-grid, in- 
compressible (Navier-) Stokes equations, constraint preconditioning, elec- 
trical networks. 

1 Introduction 

Presently, a typical computational fluid dynamics (CFD) problem may involve 
millions of unknowns. They represent velocities and pressures on a grid and 
are determined by solving a large sparse linear system of equations. Robust 
numerical methods are needed to achieve high fidelity. Therefore one often 
resorts to direct (sparse) solvers. In general such a method does not fail as long 
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as the used precision is enough to handle the posedness of the problem. However, 
there are two disadvantages to direct methods. Firstly, the amount of memory 
required for the factorization is not linear in the number of unknowns, and when 
increasing the problem size one may encounter memory limitations sooner than 
expected due to fill generated in the factors. Secondly, all the new elements 
in the factorization have to be computed, so that the computing time grows 
sharply, too. This holds especially for 3D problems, where the computational 
complexity of direct methods for partial differential equations (PDEs) grows 
with the square of the number of unknowns. 

For this reason one has to resort to iterative methods for very large ap- 
plications. Such methods perform a finite number of iterations to yield an 
approximate solution. In theory the accuracy achieved increases with the num- 
ber of iterations performed. However, iterative methods are often not robust 
for complex problems. The iteration process may stall or diverge and the fi- 
nal approximation may be inaccurate. Furthermore they often require custom 
numerics such as preconditioning techniques to be efficient. 

The hybrid direct/iterative approach presented here seeks to combine the ro- 
bustness of direct solvers with the memory and computational efficiency of itera- 
tive methods. It is based on the direct method recently developed for the Stokes 
J^-matrix by De Niet & Wubs (2009), which has the property that the fill does 
not increase in the "gradient" and "divergence" part of the matrix. To extend 
this to an incomplete factorization preconditioner one only has to drop velocity- 
velocity couplings to limit the amount of fill. We perform a non-overlapping 
domain decomposition of the grid, and eliminate the interior velocities using a 
direct method. For the remaining variables a Schur-complement problem has to 
be solved, which we do by a Krylov subspace method preconditioned by a novel 
incomplete factorization preconditioner. 

In this paper we start out by giving a survey of previous research in section[2l 
In section [3] we will describe the problem in more detail and review the direct 
method developed by De Niet & Wubs (2009). In section 2] we will introduce 
the proposed iterative procedure based on this direct method. In section [S] we 
present numerical results for a series of increasingly complex CFD problems: 
the Poisson, Darcy, Stokes and Navier-Stokes equations. 

We conclude in section [6] by summarizing the method and results and giving 
an outlook on future work. 

2 Survey of previous work 

By Benzi et al. (2005) a survey is given of methods currently in use to solve 
linear systems from fluid flow problems. In many cases saddle point problems 
can be solved efhciently by a Krylov subspace iteration (Van der Vorst 2003) 
combined with appropriate preconditioning (Benzi & Olshanskii 2006, Benzi et 
al. 2005, De Niet & Wubs 2007, Elman et al. 2002, Kay et al. 2002, Elman et 
al. 2008). Often a segregated approach is used, i.e. the velocities are solved 
independently from the pressures. This results in inner and outer iterations. 
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the former for the independent systems, and the latter to bring the solutions 
of these systems into balance with each other. We advocate a fully coupled 
approach. 

The idea of combining direct and iterative methods has been used by Henon 
& Saad (2006) and Gaidamour (2008) to solve general sparse linear systems 
arising from the discretization of scalar PDEs. As in this paper, they reduce 
the problem to a Schur-complement system on the separators of a domain de- 
composition. The Schur-complement system is solved iteratively using an ILU 
factorization. As the structural and numerical properties are not explicitly pre- 
served, robustness and grid-independence cannot be ascertained for indefinite 
problems. 

Recently, De Met & Wubs (2009) proposed a direct method for the solution 
of J-'-matrices, of which the incompressible Stokes equations on an Arakawa C- 
grid are a special case. This special purpose method reduces fill and computa- 
tion time while preserving the structure of the equations during the elimination. 
It still suffers from the weaknesses of direct methods, but only the number of 
velocity-velocity couplings increases, not the number of velocity-pressure cou- 
plings. We believe that a better understanding of the J"-matrices will lead to 
generalizations that are of interest to a broader class of indefinite problems and 
note that there are applications outside the field of fiuid mechanics, e.g. in 
electronic circuit simulations (Vavasis 1994), which lead to J^-matrices. 

For incompressible flow one has to satisfy an incompressibility constraint: 
the velocity should be divergence-free. We remark that our iterative technique 
does not violate the divergence constraint and therefore belongs to the class of 
'constraint preconditioners' (Keller et al. 2000). For details see section 1475] 

3 J^-matrices and the direct solution method 

In this paper we study the solution of the equation 

Kx = b, (1) 
where K € ji(n+m) x (n+m) > m) is a saddle point matrix that has the form 



K 



A B 




(2) 



with A e i?"^", B G i?"^™. Special attention is given to a class of saddle point 
matrices known as J-'-matrices. We start out by defining the gradient matrix in 
which the J"-matrix is expressed. 

Definition 1 A gradient-matrix has at most two nonzero entries per row and 
its row sum is zero. 

We have chosen the name gradient-matrix, because this type of matrix typically 
results from the discretization of a pressure gradient in flow equations. It is 
important to note that the definition allows a gradient-matrix to be non-square. 
Now we can define the J^-matrix. 
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Figure 1: Positioning of velocity {u,v) and pressure (p) variables in the C-grid. 

Definition 2 A saddle point matrix (0j is called an J- -matrix if A is positive 
definite and B is a gradient-matrix. 

The definition is due to Tiima (2002). 7^-matrices occur in various fluid flow 
problems where Arakawa A-grids (collocated) or C-grids (staggered, see figure[T]) 
are used. For example, in Arioli & Manzini (2003) the discretization of Darcy's 
equation in ground-water flow results in an J^-matrix. They also occur in elec- 
tronic network simulations (Vavasis 1994). 



3.1 The algorithm for the direct approach 

Many of the standard algorithms have in common that they compute a fill- 
reducing ordering for K and then somehow adapt it to make it feasible: a 
factorization is feasible if it does not break down due to a zero pivot. The 
delay of elimination (through pivoting) will give an increase in computing time 
and may lead to increased fill in the factors. To preclude this inefficiency we 
propose a different approach. Suppose the sets of all velocities and pressures 
are denoted by V and P, respectively. The respective elements will be called 
F-nodes and P-nodes. The idea is to first compute an ordering for the T^-nodes 
based on a graph that contains information of the whole matrix, and then insert 
the P-nodes appropriately. Assume that we have an elimination ordering on V, 
then we use the following simple rule to insert P-nodes into the ordering: 

Rule 1 during Gaussian elimination with K , whenever a V-node is to be elim- 
inated which is connected to a P-node, these nodes are eliminated together using 
a 2 X 2 pivot. 

With this rule we get as many 2x2 pivots as there are P-nodes. Only if 
due to elimination a V^-node becomes totally disconnected from P it can be 
eliminated on its own. 

As all P-nodes are eliminated together with a F-node in pivots of the form 
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the factorization is always feasible and additional pivoting is not required. 

If we apply this rule to an ordering on V that is constructed as a fill-reducing 
ordering for the resulting ordering for K will not be fill-reducing in general. 
To ensure that the final ordering is fill-reducing we have to use information 
about the whole matrix, i.e. the fill patterns of B and have to be taken into 
account. This is the case if the ordering for V is fill-reducing for the fill pattern 
F{A) U F{BB'^), where F{A) denotes the fiU pattern of A. This graph is an 
envelope for the fill that will be created by elimination of the nodes in P. In 
many cases this will be equal to F{A + BB'^), but to avoid possible cancellation 
in the addition we will use the matrix F{A) U F{BB"^). Summarizing we get 
the following algorithm: 

Algorithm 1 To compute a feasible fill-reducing ordering for the saddle point 
matrix K : 

1. Compute a fill-reducing ordering for the V -nodes based on F{A)UF{BB'^). 

2. Insert the P -nodes into the ordering according to rule\^ 

The P-nodes (step 2) can be inserted dynamically during Gaussian elimination, 
which means that we have to adapt the elimination process. The elimination is 
performed using the fill-reducing ordering on V and applying rule 1. This also 
takes into account that T^-nodes initially coupled to P-nodes become decoupled 
because of cancellation, which is a rather common phenomenon (see section |4^ . 
This is different from just combining pressures with velocities beforehand (static 
pivoting) . 

The above method has structure preserving properties which we list in the 
theorems below. The first two are taken from De Niet & Wubs (2009), where 
they were proved for symmetric positive definite A. Along the same lines they 
can be proved for non-symmetric positive definite A. 

Theorem 1 If K is an J- -matrix, all Schur complements K'^^^ are J- -matrices. 

This means that the A part will remain positive definite and the B part will 
have at most 2 entries per row in any step of the elimination. The latter allows 
us to keep the B part exact during the incomplete factorization. 

Theorem 2 The B part in all Schur complements is independent of the size of 
the entries in the A part. 

Theorem 3 // initially B has entries with magnitude one, then this will remain 
so during the elimination. 

Theorem A If a P-node is not eliminated together with the first V-node it is 
attached to, the next Schur complement will not be an F-matrix. 

Proof Consider the matrix in Equation [3] in the next section. It is clear that 
using only a as pivot will give a contribution in the zero block. □ 

Results of the direct method were shown with AMD (Amestoy et al. 1996) 
as fill reducing ordering in De Niet & Wubs (2009). 
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4 Structure preserving incomplete factorization 



In this section we want to develop an incomplete factorization based on the direct 
method described so far. First we will introduce the domain decomposition we 
use and then we will illustrate that simply applying a dropping strategy to the 
A part may not give the desired result when there are couplings to P-nodes. 
We then proceed to develop a combination of orthogonal transformations and 
dropping that leads to grid-independent convergence, limits fill-in and keeps the 
divergence constraint intact. 

Assumption. For this section we will assume that the entries in B have equal 
magnitude. This is not a restriction because it can be achieved by scaling the 
rows of an arbitrary gradient matrix B. If DB gives the desired matrix, our 
new matrix will be 

" DAD DB ' 

B^D O 

Observe that the post-scaling means that the F-nodes will be scaled. For Navier- 
Stokes on a stretched grid (see section 15. 4|) the scaling is such that we get as 
new unknowns the fluxes through the control cell boundaries. 

4.1 Domain decomposition 

The first step of the proposed method is to construct a non-overlapping decom- 
position of the physical domain into a number of subdomains. This can be done 
by applying a graph-partitioning method like Metis (Karypis & Kumar 1998) 
or similar libraries to F{A) U F{BB^). Metis has been tested successfully, but 
for this paper we use a manual partitioning into equally-sized square subdo- 
mains. (For the Navier-Stokes equations we used a stretched grid, so in that 
case they are not square and equally-sized in physical space but in the number 
of unknowns). 

Then we introduce a minimal overlap: two adjacent subdomains share one 
layer of velocity nodes, whereas pressure nodes are not shared among subdo- 
mains. Variables belonging to exactly one subdomain are said to be interior 
variables. Velocities connecting to interior variables in more than one subdo- 
main form separators of the subdomains they connect to. The separator veloc- 
ities are complemented by an arbitrary single P-node per subdomain. When 
eliminating the interior variables in the next step, this ensures that the sub- 
domain matrix is non-singular (in physical terms the pressure level inside the 
subdomain is fixed). We remark that 

(i) the domain decomposition can be seen as a Nested Dissection ordering as 
may be used in step 1 of Algorithm [1] stopped at a certain subdomain size 
(see also (Toselli & Widlund 2005) in the paragraph "Schur Complement 
Systems" starting on page 262); 
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(ii) we used horizontal and vertical separators as depicted for two domains in 
fig. [2j A better choice may be to use skew separators (±45°), leading to 
about half the V nodes on the separator for subdomains of similar size. 
Both approaches yield the same number of V nodes with couplings to P 
nodes in the Schur-complement, and we chose for ease of programming 
here; 

(iii) we use the decomposition primarily for numerical reasons and the number 
of subdomains will typically be much larger than the number of processors 
in a parallel computation. 

We can now eliminate the interior variables, leading to a Schur-complement 
problem for the separator velocities and remaining pressures. The remainder 
of this section is devoted to constructing an incomplete factorization precondi- 
tioner for this Schur-complement, so that it can be solved efficiently by a Krylov 
subspace method. 

4.2 The dropping problem 

Consider the following matrix, which occurs in any elimination step with a 2 x 2 
pivot: 



a 






- 


P 










a 


b 


A 


B 


h 







_ 



(3) 



When performing the elimination step, a multiple of bb^ is added to A. This 
does not introduce new fill if A is dense. But if we replaced A by a sparse matrix 
by dropping, the matrix would be filled again as b is typically dense. 

This is a common phenomenon. Consider, for example, the two-domain case 
in fig. [2l After eliminating the interior variables, many of the T^-nodes on the 
separator are coupled to the two remaining P-nodes. Assume that we drop all 
connections between the y-nodes on the separator, so in the above matrix 
A is replaced by its diagonal, and a becomes zero; 6 is a dense vector, B has an 
associated dense column with opposite sign, and 6^ has a nonzero at the same 
column position with sign opposite to that of (3. When eliminating one "T^-node 
P-node" pair, all the T^-nodes on the separator become detached from P and A 
becomes dense. 

From the above we learn that we should try to get more zeros into 6. Or 
stated otherwise, we should try to decouple the l^-nodes on the separator from 
the P-nodes as far as possible. 

4.3 Orthogonal operators to decouple V- and P-nodes 

One idea to get rid of unwanted pressure couplings is to simply drop them. How- 
ever, the fill in the P-part is already modest and an exact P-part is attractive. 
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Figure 2: Velocity separators {u, v) and pressure per domain (p) in a 2-domains case. 

as discussed in section l475l Fortunately we can do better. Consider the square 
domain decomposition (fig. [2]), extended periodically so that every subdomain 
is bounded by four separators from the neighboring subdomains. The Schur- 
complement for the separator velocities and remaining pressures has about the 
following form (the T^-nodes in the corners are neglected here, in practice they 
form 'separators of the separators' and get a block of their own): 



" An 


Bi 


A12 


Ai3 
















Bl 





B21 


B31 










Pi 






A21 


B21 


A22 





A2i 


B22 




V2 






^31 


B31 





A33 


A34 


B32 




V3 












Ai2 


Ai3 


An 


B42 




V4 












B22 


B32 


BI2 














Here vi contains the T^-nodes on a certain separator, pi contains the two P- 
nodes from the adjacent subdomains; V2 and V3 contain the V^-nodes from other 
separators around these subdomains, respectively. U4 and p2 represent the re- 
maining V- and P-nodes in the Schur-complement (separator velocities and 
pressures not connected to the separator under consideration). 

Now Bi only contains two dense columns, equal up to a sign. So by using an 
orthogonal transformation H, e.g. a Householder reflection, we can transform 
Bi into a matrix with only entries on a certain row, usually the first. Applying 
H to the first block row and column from left and right, respectively, we obtain 
the following system (note that the properties of the matrix are preserved by 
the orthogonal transformation) : 



H^AnH 


H^Bi 


H^A,2 


H^A,3 
















[H^B.Y 





B21 


B31 










Pi 




bpi 


A21H 


B21 


A22 





A24 


B22 




V2 




bv2 


A31H 


B31 





A33 


A3i 


B32 




V3 




bv3 








Ai2 


A43 




B42 




V4 




bvi 








B22 


B32 


BI2 







P2 




bp2 



The Householder matrix is a full matrix (though its application is cheap if 
its defining form is exploited) and would destroy the sparsity. However, the 



8 



matrices An, A12 and A13 are typically already dense (see remark below), so 
not much is lost and we have gained a lot: we decoupled all but one of the V- 
nodes on the separator from the P-nodes. The decoupled ones can be eliminated 
on their own now. 

Remark 1 The fill of An, A12 and A13 depends on the problem at hand. 
For the 2D Stokes-equations in the absence of the pressure terms we get two 
decoupled Poisson equations for u and v. In that case nested dissection gives 
connections between all the variables surrounding a domain. So the matrices 
Aii,Ai2, and A13 are half full (no couplings between u and v). As most pressures 
are eliminated with the interior velocities, the matrices become dense. 

Remark 2 In practice, u and v nodes on a separator may connect to the P- 
nodes with reversed signs. To ensure robustness we apply separate transforms 
to each velocity component. 

Remark 3 Choosing a Householder transformation may seem arbitrary and 
not related to the physics of the problem. We may indeed choose other orthog- 
onal transformations with the same effect (some alternatives are proposed at 
the end of section 14. 4p . The key is that one of the columns of iJ - up to a 
normalizing factor - should be the vector e with all entries equal to one. This 
yields the sum of all the fluxes through the interface, so there will be a new 
variable that represents the entire flux through the interface. The other new 
variables represent fluxes through the interface that are on average zero. 

Remark 4 Instead of scaling the vectors in H to unit length, we scale them 
to the length ni — ||e||2 of the vector e defining H. In that case the inverse of 
H is ^H^. 

m 

Remark 5 Although not necessary for the decoupling process, we also apply 
an orthogonal transformation to V^-nodes that are not coupled to a P-node in 
the first place. This is important for the dropping strategy proposed in the next 
section. 

The situation depicted in eq. |3] now only occurs once per separator and ve- 
locity component, namely for the T^-node still coupled to the P-nodes. Because 
of the transformation b is now zero, and no fill is generated. 

So far we have not made any approximations, and while we have zeroed out 
most of the F-node/P-node couplings, a dropping strategy has to be applied in 
the V-V part to get a sparse preconditioner for the Schur-complement. However, 
the Householder transformation combined with standard dropping techniques 
for the SPD case will generally not lead to grid independent convergence. This 
requires that the approximation is spectrally equivalent to the original matrix. 
We will consider a new way of dropping in the next section which has this 
property. 
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4.4 Dropping strategy 

The general idea of the approximation is the following. We replace the flux 
through grid cell faces forming a separator by the combined flux through that 
separator (see Remark 3 in the previous section). Then we try to reduce the 
problem of finding all separator velocities by dropping and elimination to the 
related problem of finding the new fluxes (or summed velocities). This reduced 
problem can still be understood in terms of conservation of mass and momentum 
and its form is very similar to the original problem. 

Let us consider an orthogonal operator that is more intuitive than the House- 
holder transformation. Suppose e is a vector with all ones and C is an orthogonal 
extension of e such that the length of every column is the same. Define a square 
matrix 

H = [C,e], 

which is orthogonal up to a constant factor (see Remark 4 in the previous 
section) . This operator is applied to the velocity component in normal direction 
on the separator. These velocities have the same sign for the connection to 
the pressure and therefore again only one row remains in H^Bi. The first 
component of H'^v will be the sum of the components of w; we will call this a 
Vg-node from now on. To develop some intuition, we first give a simple example 
of the dropping strategy which reveals that the resulting reduced problem can 
be viewed as a coarse representation of the original one. In section [4.4.2l we then 
perform a more general analysis. 

4.4.1 Example of dropping 

Consider the familiar tridiagonal matrix with elements [—1 2 — 1] on the sub-, 
main, and superdiagonal, respectively, which arises when discretizing the ID 
Laplace equation on a regular grid. We premultiply it by the block diagonal 
matrix H with diagonal blocks 

" -1 1 " 
1 1 

and postmultiply by its transpose (the same matrix here). Every pair of rows 
of the transformed matrix has the form 

1 16 1-1 
-1-10 2 1-1 

Next we make an odd-even ordering for the unknowns (equivalent to shifting 
the Vb-nodes to the end of the matrix). This new matrix has the form 

" All Ai2 ' 

A21 A22 

The matrix An is tridiagonal with entries [16 1], and A22 is tridiagonal with 
entries [—12 — 1] . So A22 is a representation of the original problem on a twice 
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(4) 



as coarse grid (up to scaling). The blocks A12 and A21 have row sum zero, a 
typical row of A12 being [10 — 1]. We just drop these two blocks and take 
the remaining part as the approximation. The fact that both An and A22 are 
principle submatrices of the above matrix infers that they both are SPD, so 
the approximation does not lead to a singular or indefinite matrix. Note that 
the elements in the dropped part are quite big and would not be dropped by 
traditional drop-by-value strategies. 

To prove grid-independent convergence when using the resulting matrix as 
preconditioner, we have to show that A2iA^^ A12 < 7^^22, for some 7 < 1 
independent of the size of the matrix (the case 7 = 1 follows directly from 
the positiveness of the Schur-complement of the original problem. For grid- 
independence we just need some extra margin). We can apply Fourier analysis in 
this constant coefficient case, which leads to the problem of finding the maximum 
of 

sin(0)2 _ cos(6'/2)2 

(6-f 2cos(6l))sin(6l/2)2 ^ 1 + cos(6i/2)2 ' 

This amounts to finding the maximum of x/ (1+2^) on [0,1], which is a monotonic 
function, so the maximum is 1/2. 

Another approach is to view the matrix as a sum of "element" matrices Ei 
and the preconditioner as a sum of Ft. Using the Rayleigh quotient, one can 
easily show that the condition number of the preconditioned matrix is bounded 
if {x,Eix)/{x,Fix) is bounded from below and above for x not in the common 
null space of Ei and Fi for all i (e.g. see (Axelsson & Larin 1997)). The singular 
vector of the transformed matrix (jlj is [0, 1, 0, 1, 0, 1, ...]-^, and for the element 
matrices of the transformed problem and the approximation we can use 





3 


1 


1 


-1 " 




' 3 





1 





E^ = 


1 


1 


1 


-1 


, F = 





1 





-1 


1 


1 


3 


-1 


1 





3 







-1 


-1 


-1 


1 







-1 





1 



Both matrices are nonnegative and the condition number of E2 Ei is bounded 
on the space orthogonal to [0,1,0,1]-^. This approach also reveals that we can 
replace An by any positive diagonal matrix and still have a condition number 
independent of the mesh size. This concludes our simple example. 



4.4.2 General analysis 



These contemplations suggest that the following lemma and its corollary play a 
key role in devising a dropping strategy: 

Lemma 1 Principal submatrices of an (S)PD-matrix are (S)PD. 
Corollary 1 



All A12 
A21 A22 



is (S)PD the 



All 
O 



O 

A22 



is (S)PD. 
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Since wc only make approximations in the A part of the matrix K, we have the 
following lemma. 

Lemma 2 If A is SPD, the condition number of the preconditioned K matrix 
is hounded by the condition number of the preconditioned A, where as precondi- 
tioner an SPD approximation of A is used. 

Proof: Consider the generalized eigenvalue problem 



A-\A 
(l-A)B^ 



(1 - X)B 
O 



Xi 
X2 



= 0, 



(5) 



where A denotes an SPD approximation of A. We see that for A 1 (A = 1 is 
clearly an eigenvalue) we can scale the border by any constant. So the eigenvalue 
problem is in fact an eigenvalue problem restricted to the kernel of the divergence 
(or constraint) operator . Suppose Q is an orthogonal basis for the kernel of 
, then we have to find the eigenvalues of the pencil {Q^AQ, Q^AQ. Now 



~ . {x,Ax) ^ . iij.CrAQy) ^ {y,Q''-AQy) 
Xrr,ir,{A,A) = mm = — < mm < 

-- Amax(A-4). 



< 



^ {x,Ax) V {y.Q^AQy) 

{y,Q'^AQy) {x,Ax) 

max ^ = < max = — 

y {y.Q^AQy) ^ {x,Ax) 



Hence, the eigenvalues of the preconditioned K are bounded by the eigenvalues 
of the preconditioned A, which leads to the result. □ 

These lemmas set the ground for further reasoning that will lead to grid- 
independent convergence. In the remainder of this section we assume that A 
is symmetric and positive definite. Let us extend H with an identity for the 
unknowns that are not transformed and write H = [Hi,H2], where 



Hi = 



C 




e 
/ 



The transformed matrix is given by 



B^AB = 



H{AHi HfAH2 
H^AHi H^AH2 



(6) 



Here H2 AH2 is a Galerkin approximation of A and hence it can be viewed 
as a discretization on a coarser grid (in fact it is an aggregation similar to 
that used by Notay (2010), albeit Notay applies the aggregation directly to 
the discretized PDE whereas we apply it to its Schur complement on the sep- 
arators). If A is obtained from a stable discretization of a second-order dif- 
ferential operator, then H^AHi has a condition number independent of the 
mesh size if the dimension of C is fixed (i.e. if the length of the separator is 
fixed). We will prove this for a very simple case using finite element theory. 
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Wc just consider the operator with p > on the interval {0,d) 

with homogeneous Neumann boundary conditions. Hence the related bilinear 
form is a{u,v) = {pu',v') = ^^pu'v'ds^ where we have used the inner product 

[u, v) = uvds. The norm associated with this inner product is denoted by 
II • ||. Here u,v and p are all functions in the Sobolev space 7^i(0,c?), which 
consists of all continuous functions that are piecewise differentiable. We will 
also apply this inner product to vectors of functions, which should be read as 
applying it element by element. 

Lemma 3 Let A = a{V, V) and M = {V, V), where V [(/)i(s), (/)2(s), ...4)m{s)] 
is a row vector of basis functions in 'Hi{0,d) with the property that there exists 
a constant c such that -^M — {V',V') is nonnegative. If C2\\u'\\^ > a{u,u) > 
ci||7/'|p, the spectral condition number of {H^ AH\)/{H'[MH\) is bounded by 

{d/hf: 

Proof: A straightforward substitution of u = VH\x in the inequality leads to 

C'iWV'Hxxf > a{VHix,VHix) = {x,H'[AHix) > ciWV'HixW^. 

Now the minimum of ||/'||/||/||, where / is an arbitrary Hi function orthogonal 
to the constant function is just the eigenfunction of the ID Laplace operator 
with homogeneous Neumann boundary conditions orthogonal to the constant, 
which is cos(7rs/rf). So 

X \\VHix\\^ f ll/IP \dJ 

Hence, the smallest eigenvalue of {H'[AHi)/{HfMHi) is bounded away from 
zero by ci(^)^. Now let us try to find an upper bound which is less than infinity. 
This maximum possible is related to the highest frequency we can build from 
VH\x such that the norm of V'Hix becomes maximal. The shortest wave 
that can be represented is related to the mesh size h. Thus we came to the 
assumption in the theorem which can be quite easily verified in a special case 
using for instance the Gershgorin circle theorem. We find that 

WV'HixW' ^ c x'^H'lMHix _ c_ 
WVHixW^ - \\VHix\\^ ~ /i2' 

so the spectral condition number of Hj'AHi/H^MHi is bounded by fp^- □ 
The constants Ci and C2 are easily determined, we can simply take the mini- 
mum and maximum of the function p{s) on the interval, respectively. Although 
this lemma is based on a second-order differential operator, one could in fact 
find a similar statement for nonnegative operators with pseudo derivative 2i/, 
where v may be any positive real number. Such an operator is found, for in- 
stance, when writing down the continuous equations at the separators, leading 
to the so-called Steklov-Poincare operator (see (Toselli & Widlund 2005)). To 
return to our discussion, for d in the lemma one could think of the length of 
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the separator. So if d decreases proportional with h when refining the grid, the 
condition number of Hj^ AHi is bounded independently of the mesh size assum- 
ing we can bound the condition number of M beforehand. The latter matrix is 
usually strictly diagonally dominant. So simply applying Gershgorin's theorem 
makes the assumption valid. 

Now assume we have the following strengthened Cauchy-Schwarz inequality 
(Axelsson 1994, section 9.1) 

\x^HlAH2y\<i{{x^HlAHix){y'^H^AH2y)Y^ (7) 

holding independently of the mesh size. In our case Hj^ H2 = 0, and if the 
columns of Hi or H2 span an invariant subspace of A, then also AH2 = 0, 
hence 7 = 0. The latter is only approximately the case here, so we will find 
some 7 < 1. Lemma 9.2 from (Axelsson 1994) states that 

AHi{H^ AHi)-^ AH2 < j'^hJAH2, (8) 

where the inequality should be understood in the sense that the sum of the 
left-hand side and the right-hand side gives a non-negative matrix. For ease of 
notation we write the transformed matrix ([6|) as 

" An A12 ' 

A21 A22 

In this notation the above property reads A2iA^^Ai2 < 7^^422. Now the pre- 
conditioner obtained by dropping ^421 and A12 is SPD according to Corollary[TJ 
The eigenvalues of the preconditioned A matrix can be found from the following 
generalized eigenvalue problem. 

" (1 - X)An -\Ar2 

-XA21 (1 - X)A22 

which leads to ((1 - A)^^22 - X'^ A2iA^l Ai2)x2 = for A ^ 1. Combined with 
the previous, we can only find eigenvalues for (1 — A)^ < (A7)^, so 1 — A7 < 
A < 1 -|- A7. So we find that the condition number of the preconditioned matrix 
is less than (1 + 7)/(l — 7), where 7 is independent of the mesh size. Using 
Lemma [2] we find the main result of this paper. 

Theorem 5 If a strengthened Schwarz-inequality ^ holds for < 7 < 1 inde- 
pendent of the mesh size, then we have convergence independent of the mesh-size 
when the dropping process as discussed above is applied. The condition number 
of the preconditioned K matrix is bounded by (1 + 7)/(l — 7). 

The situation above remains the same if wc apply the transformation to all 
separators at once. After the transformation, only the unknowns associated 
with ^22 are coupled to pressures. We may still have couplings between various 
separators in An, but the condition number of that matrix is independent of 
the mesh size. To lower the computational cost we also drop couplings between 
separators in An. We conclude this section by a number of remarks concerning 
the dropping strategy. 



Xl 
X2 



= 0, 
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Scalar equations. The reader may have noticed that in this section we hardly 
mentioned the pressure. In fact, the combination of orthogonal transformations 
and dropping may also be applied to the pure diffusion problem. In section [5] 
we will start out by showing numerical results for the scalar Poisson equation. 

The nonsymmetric case. One may ask how much of the above can be gen- 
eralized to the nonsymmetric case (for instance the Navier-Stokes equations). 
Assume the nonsymmetric matrix A is positive definite (PD), i.e. {x,Ax) > 
for any non-trivial x. Then the Schur complement is PD and the orthogonal 
transformation does not destroy that property. Since all principle submatrices 
of a PD matrix are PD, the approximation will be PD. So the factorization will 
not break down. To say something about the condition number of the precondi- 
tioned matrix is more difficult. For a mild deviation from symmetry we expect 
the same behavior as for the symmetric case. However, the numerical results for 
the Navier-Stokes equations at relatively high Reynolds-numbers indicate that 
the method works very well even for highly non-symmetric matrices. 

Numerical stability. In traditional lumping, only possible for M-matrices, 
one simply lumps a coefficient on the diagonal. This means that a nonnegative 
matrix is subtracted. Eijkhout (Eijkhout 1992) showed already in the nineties 
that this may give a zero on the diagonal. This is easy to preclude by simply not 
allowing the diagonal to become zero. What is much harder to prevent is the 
occurrence of independent systems in the preconditioner, some of which may be 
singular. This easily occurs in anisotropic problems. The proposed dropping 
does not suffer from these problems. 

Alternatives for H. Finally we propose a simple orthogonal extension to e 
in order to form H. Let m be the order of H and note that [1, —1, 0, • • • , 0]^, 
[1, 1, —2, 0, • • ■ , 0]-^, ■ • • , [1, • ■ • , 1, — (m — 1)]^, e are all orthogonal. They can be 
used for the extension after a proper scaling to the length of e. The application 
of this operator can be implemented by keeping a partial sum. In this way 
about 2m additions of rows of the matrix it is applied to are needed. The 
Householder transform has a similar operation count. One may ask whether 
alternative choices for C in Hi influence the convergence. This is not the case. 
We can replace Hi by HiQ. For arbitrary orthogonal matrices Q this has no 
infiuence on (j7l8|) and the following analysis. 

4.5 Iteration in the kernel of 

Since the fill of the B part remains at most 2 per row during the whole pro- 
cess, we will not drop there. This means that the B matrix is exact in the 
factorization, and with appropriate dropping (such as the strategy introduced 
in the previous section) , the eigenvalues of the preconditioned matrix will all be 
positive and real. Still, we cannot directly apply the preconditioned conjugate 
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gradient method since for that both original and preconditioner must be posi- 
tive definite in the Krylov subspace. We can enforce this condition by building 
the Krylov subspace IC{K~^K, x) on a starting solution x that satisfies the con- 
straint. In exact arithmetic K. then remains in the kernel of . In practice, 
accumulation of round-off errors will undermine this property. 

This problem is often encountered in the field of constraint optimization, and 
Gould et al. (2001) have developed a variant of the conjugate gradient method. 
Projected Preconditioned CG (PPCG), which can be used for the Stokes prob- 
lem. There are various ways to find a particular solution of B^v — 62, one of 
which is solving the system once, replacing K by the preconditioner. 

For the Navier-Stokes equations one could devise a Projected Preconditioned 
FOM method, as long as the eigenvalues of the preconditioned matrix are in 
the right half plane, but for the results shown in section 15.41 we simply used 
MATLAB's gmres. 

4.6 Program structure 

Before looking at numerical results, let us review the complete algorithm and 
remark on some implementation issues. The main structure of the program is 
as follows 

1. Perform a domain decomposition on F{A) U F{B)F{B)^ . We just make 
a rectangular decomposition of the domain here. 

2. Group the variables into subdomain variables and separator variables (ve- 
locities connecting to variables in more than one subdomain). All pres- 
sures are treated as subdomain variables at this stage. 

3. Group the separator variables according to variable type (i.e. u, v) and 
the subdomains they have connections to. Thus, we will get a group of 
w-velocities connecting to variables on subdomains 1 and 2, for instance. 

4. In the corners of subdomains a complete conservation cell (see fig. [T]) can 
occur on the separators. This would lead to a singularity in step [HI The 
velocities making up such a cell are flagged 'l^s '-nodes (cf. step [51). Both 
these l^s-nodes and the P-node in the cell will be retained in the Schur- 
complement. 

5. Pick for every domain a P-node to be kept in the reduction, and shift these 
to the end of the ordering (i.e. retain them in the Schur-complement). 

6. Eliminate all interior variables of the subdomains and construct the Schur 
complement system for the velocity separators and the selected pressures. 

7. Perform the transformation on each separator group identified in step [31 

8. Identify Vs nodes (separator velocities that still connect to two pressures) 
and put them at the end of the ordering, just before the remaining pressure 
nodes. 



16 



9. Drop all connections between non-Fs nodes and Ve nodes, and between 
non-Vs nodes in different separator groups. The resulting matrix is block- 
diagonal with the 'reduced Schur-coniplement' in the lower right corner. 

10. Iterate on the Schur complement using the matrix of the previous step as 
preconditioner. This preconditioner is easily applied using LU decompo- 
sitions of all non-Vs blocks and the reduced system. 

In three space dimensions, step [3] is implemented by first numbering the 
faces, then the edges and then the corners of the box-shaped subdomains. We 
note that this is a special case of the hierarchical interface decomposition (HID) 
used by Henon & Saad (2006) and Gaidamour (2008). 

4.7 Computational complexity 

We will now discuss the complexity of the algorithm, implemented as discussed 
in the previous section. We assume that a direct method with optimal com- 
plexity is used for the solution of the relevant linear systems, so in 3D if the 
number of unknowns is 0{N), the work is 0{N'^), as with Nested Dissection. 
For the 3D (Navier-) Stokes equations, we have N = 0{n^) unknowns, where 
n is the number of grid cells in one space dimension. We keep the subdomain 
size constant and denote the number of unknowns per subdomain by S* = O(s^) 
(here s is the fixed separator length). Hence, there will be N/S subdomains. 
Per domain there will be 0{s'^) non-Vs- and 0{1) T^s-nodes. Per domain the 
amount of work required is as follows; 

1. 0(5^) for the subdomain elimination; 

2. transformation on faces with H: 0{s^); 

3. factorization of non-V^E nodes: 0{{s'^)^) = OiS"^). 

The total over ah domains is 0{N/S)0{S'^) = 0{NS), so in this part the 
number of operations decreases linearly with S (e.g. by a factor 8 if s is halved). 

The solution of the reduced problem (Vs-nodes) requires 0{{N / SY) oper- 
ations. Here doubling s will decrease the work by a factor 64. So in total the 
work per iteration is 0{NS) -\- 0{{N / SY). The number of iterations is constant 
for S constant. There is, however, a positive dependence on S as we may expect. 
In the next section we will observe that the number of iterations is proportional 
to log(S'). So if we double s, a fixed amount of iterations is added. 

It is clear that if we solved the reduced problem iteratively by applying 
our method recursively until the problem has a fixed grid-independent size, the 
overall complexity would be log(S')C'(A^S'). 

5 Numerical experiments 

In this section we will demonstrate the performance of the new solver by ap- 
plying it to a series of increasingly complex problems relevant to computational 
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fluid dynamics. For each problem we first keep the subdomain size constant 
while refining the mesh. As discussed in the previous section, the complexity 
of the algorithm will then be linear in the number of unknowns except when 
solving the reduced Schur complement: the operations required to factor a sin- 
gle subdomain matrix stays the same and the number of subdomains increases 
linearly with the grid size. Furthermore, both size and connectivity pattern of 
the separators remain the same so the amount of work per separator remains 
constant while the number of separators increases linearly, too. 

The second experiment will be to fix the grid size and vary the subdomain 
size (i.e. the number of subdomains). The expectation here is that due to 
fill-in the bulk of the work load shifts from the Schur-complement towards the 
subdomain factorization as the size of the subdomains is increased. 

For each experiment, the following data is displayed: 

• Tlx - the grid size is x Hx (n^ x n^; x n^) in 2D (3D), respectively. 

• Sx - the subdomain size is Sx x {sx x s^; x s^;) in 2D (3D), respectively. 

• N - number of unknowns (size of the saddle point matrix), 

• nnz - number of nonzeros in original matrix, 

• Ns - number of unknowns on the separators and remaining p's (size of the 
Schur-complement) , 

• n - number of V^s and remaining p's (size of reduced Schur-complement), 

• iter - number of CG iterations performed on the Schur-complement to 
reduce the residual norm by 1/tol =10^, 

• fill 1 - grid-independent part of relative fill-in (number of nonzeros in 
the solver divided by number of nonzeros in original matrix). The grid- 
independent portion consists of 

— a) fill-in generated while factoring the subdomain matrices 

— b) fill-in generated while constructing the Schur-complement 

— c) fill-in generated while factoring the separator-blocks of the precon- 
ditioner 

• fill 2 - grid-dependent part of relative fill-in, generated when factoring the 
n x n-dimensional reduced Schur-complement. 

• K - condition estimate of the preconditioned Schur-complement: fraction 
of the largest and smallest eigenvalue (by magnitude) of the generalized 
eigenvalue problem Sx + XMx = 0, where S is the Schur-complement, and 
M the preconditioner used. We use approximations to the actual eigenval- 
ues computed by MATLAB's 'eigs' command (Not all tables contain this 
value). 
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Remark the fill listed under fill 1 b) can be avoided by not explicitly con- 
structing the Schur-complement. The fill listed as 'fill 2' grows with increasing 
grid size, but it can be made grid-independent by solving 5*2 iteratively, too (i.e. 
by applying our method recursively). 

We do not show plots of the convergence behavior. Since all the results 
are obtained by CG the convergence is, apart from the first few digits gained, 
completely regular, which shows that the eigenvalues, except for a few outliers 
at the beginning, appear in a cluster. The relatively stringent convergence 
tolerance of 8 digits ensures that the overall convergence behavior does not 
strongly depend on the choice of the initial vector. Choosing a smaller tolerance 
results in stagnation for some of the tests below because the conditioning of the 
matrix doesn't allow for more accurate solutions. 

The general behavior we observe in the second experiment is that the number 
of iterations scales with log(Sa;), where Sx is the separator length. So doubling 
the separator length means an increase of the number of iterations by a constant 
amount. 

5.1 The Poisson equation 

We first investigate Poisson's equation, discretized using second order central 
differences on a regular structured grid (standard 5-point and 7-point stencils 
in 2D and 3D, respectively). This is an important case as solving Poisson's 
equation is central to most CFD problems, for instance to determine the pressure 
in explicit time stepping algorithms. Tables [T] and [2] show the 2D results. The 
first shows the dependence on grid refinement and the latter the influence of the 
domain sizes. Similar results for the 3D case are shown in tables |3] and ID 
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Table 1: 2D Poisson-equation - grid refinement, subdomain size Sx — 



5.2 Darcy's law 

For flows in porous media one often has to solve the Darcy problem, where A is 
just a diagonal matrix. One approach is to eliminate the velocities, which leads 
to a Poisson equation. Care has to be taken when calculating the velocities, be- 
cause the gradient operator has to be applied to the pressure. In this numerical 
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Table 2: 2D Poisson-equation - increasing subdomain size, grid-size — 1024 
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Table 3: 3D Poisson-equation - grid refinement, subdomain size Sx = 8 
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Table 4: 3D Poisson-equation - increasing subdomain size, grid size Ux — 64. 
★ Computed at = 32. 



differentiation of the pressure field, round-ofF errors may be amplified too much 
to obtain an accurate solution. Therefore, Darcy's problem is often solved in 
primitive form. Tables [5] through [8] show the numerical results for Darcy's law 
in two and three space dimensions. 
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Table 5: 2D Darcy-cquation - grid refinement, subdomain size Sx ^ S 
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Table 6: 2D Darcy-equation - increasing subdomain size, grid size Ux — 512 
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Table 7: 3D Darcy-equation - grid refinement, subdomain size Sx = ' 
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Table 8: 3D Darcy-equation - increasing subdomain size, grid size = 40. 
-k Computed at = 32. 



5.3 A Stokes problem 

The problem is a two-dimensional Stokes equation on the unit square 

-i^Au + Vp = , \ . . 

V • u = , j 

where u{x, y) is the velocity field and p{x, y) the pressure field; the parameter u 
controls the amount of viscosity. We can get rid of the parameter v by defining 
a new pressure variable p = pjv. If the first equation is divided by we can 
substitute p by p and the parameter v is gone. So we may assume that v — \. 

These equations are discretized on a uniform staggered grid (a C-grid, see 
fig. [T]) which results in an J^- matrix. It is singular because the pressure field is 
determined up to a constant. 

For the Stokes problem the matrix if^ represents the discrete divergence 
operator. Consequently, we call the kernel of this matrix the divergence free 
space. As a solution of this problem we choose a random vector in the divergence 
free space. So the right-hand side of the divergence equation is zero in our case. 

We start off the iteration with the zero vector (which is trivially in the 
divergence free space) and therefore we can use the projected conjugate gradient 
method (see section 14. 5|) . Results are summarized in tables [9] through [121 
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Table 9: 2D Stokes-equation - grid refinement, subdomain size s^; = 8 
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Tabic 10: 2D Stokcs-equation - increasing subdomain size, grid size rix = 512 
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Table 11: 3D Stokes-equation - grid refinement, subdomain size 
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Table 12: 3D Stokes-equation - increasing subdomain size, grid size ~ 40. 
* Computed at Ux = 32. 



5.4 Incompressible flow in a lid-driven cavity 

As test problem for the Navier-Stokes equations we use the lid driven cavity. 
In (Tiesinga et al. 2002) this problem was studied near the transition point 
from steady to transient flow. The stability of steady and periodic solutions 
was investigated using the Newton-Picard method (Lust et al. 1999) with the 
0-method for time stepping (with 6 slightly larger than 0.5 in order to damp 
high-frequency modes which would otherwise show up as spurious eigenvalues 
near the imaginary axis). The linear systems that have to be solved have a 
slightly increased diagonal, which improves the conditioning somewhat. The 
MRILU preconditioner (Botta & Wubs 1999) used at the time converged slowly 
and not at a grid- independent rate. In a recent review (Elman et al. 2008), the 
performance of a number of block multi-level preconditioners is investigated for 
the steady problem for Reynolds numbers up to 1000. These methods also solve 
the coupled equations, but perform inner iterations on the velocity and pressure 
part separately and hence require many parameters to be tuned. Below we 
demonstrate robust, grid- independent convergence for the driven cavity problem 
at Reynolds-numbers of up to 8000. 

The problem consists of calculating the flow in a square cavity with uniformly 
moving lid. The domain and boundary conditions of the lid-driven cavity prob- 
lem are shown in fig. [31 where u and v denote the velocity in x- and y-direction, 
respectively. 
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u = 1 V = 

1 

u = V — 

M = w = 1 
Figure 3: Geometry for the lid-driven cavity problem. 

The equations are given by 

-u-Vu+^Au-Vp = 0,1 

V • u = O.J ^ ' 

For the discretization we use a symmetry-preserving space discretization (Verstappen 
& Veldman 2003), which is stable and does not introduce artificial diffusion. 
Furthermore, the grid is stretched towards the boundaries in order to resolve 
the boundary layers. The ratio between largest and smallest mesh size is about 
5. This also means that we really need to change to fluxes through grid cell 
boundaries instead of velocities in order to get the required property that all 
elements in B have the same magnitude (see the beginning of section U]) . The 
convergence tolerance is set to 10^^ in these experiments. The system matrix is 
the Jacobian from the first step of the Newton method at the current Reynolds 
number. In order to avoid convergence problems of Newton's method, we use 
the result at the previous Reynolds-number as a starting solution (The Reynolds 
numbers used are shown in table llSp . 

We first focus on the effect of increasing the Reynolds-number (cf. table 
I13p . The convergence is not independent of the Reynolds- number. In our view 
this is not surprising, because the underlying continuous problem changes with 
the Reynolds number and more and more eigenvalues are getting close to the 
origin. This is different from the dependence on the mesh, where the continuous 
problem stays the same and all eigenvalues near the origin stay at their place. 

Next we refine the grid at a high Reynolds- number of 8000, close to the point 
(cf. Tiesinga et at (2002)) where the steady state becomes unstable; results are 
shown in table [TH Note that the number of iterations is going down as we 
decrease the mesh-size. This is because with decreasing mesh-size the physical 
size of the subdomains is decreasing if we keep the number of unknowns per 
subdomain the same. As the physical subdomain decreases, the diffusion plays a 
more important role than the advection on that scale. Since the approximations 
take place at the subdomain scale, the convergence behavior tends to that of 
the Stokes problem. 

We conclude by mentioning that with the resulting preconditioner it was also 
quite easy to compute eigenvalues using MATLAB's eigs routine (i.e. ARPACK). 
Hence we can now study the stability problem near the point where the steady 
state becomes unstable using eigenvalue analysis. 
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Rc 


N 


nnz 


Ns 


n 


iter 


fill 1 


fill 2 


500 




785 408 




6 794 252 




129 025 




40 069 




59 




6.41 




2.59 


1000 




785 408 




6 794 252 




129 025 




40 069 




73 




6.39 




2.59 


2000 




785 408 




6 794 252 




129 025 




40 069 




87 




6.38 




2.65 


4000 




785 408 




6 794 252 




129 025 




40 069 




104 




G.35 




2.78 


8000 




785 4U8 




(i 794 252 




129 025 




40 ()()9 




13U 




G.33 




2.72 


Table 13: 2D Driven cavity 




increasing Reynolds-number, grid-size n 


nx 


N 


nnz 


Ns 


n 


iter 


fill 1 


fill 2 


64 




12 160 




103 820 




1 793 




533 




185 




6.09 




0.418 


128 




48 896 




420 620 




7 681 




2 341 




181 




6.22 




0.953 


256 




196 096 




1 693 196 




31 745 




9 797 




167 




6.29 




1.75 


512 




785 408 




6 794 252 




129 025 




40 069 




130 




6.33 




2.72 



Table 14: 2D Driven cavity - grid refinement at Re = 8000 



6 Discussion and conclusions 

In this paper we have shown that the structure preserving complete LDLF fac- 
torization introduced in De Niet & Wubs (2009) of an ^-matrix can be trans- 
formed into an incomplete factorization. Wc constructed an iterative solver for 
the whole system, which avoids having to balance inner and outer iterations as 
in a segregated approach. Depending only on a single parameter (the subdo- 
main size), the method is as easy to use as a direct solver and gives reliable 
results in a reasonable turn-around time. 

For Stokes matrices we were able to prove grid-independent convergence. 
The total number of operations required is currently not grid-independent since 
we use a direct solver to solve the reduced system. However, the amount of 
work required for this step is reduced by about the cube of the subdomain size 
in 2D and the sixth power in 3D. So increasing the subdomain size by a factor 2 
means in 2D a factor 8 and in 3D a factor 64. For the Navier- Stokes equations 
we also observed grid-independent convergence. We are developing a parallel 
C-l— I- implementation of the method that can be applied recursively, making it 
a multi-level method. 

We proved the robustness of the method for Stokes and Navier-Stokes equa- 
tions, where in the latter case the matrix should be definite. Computations 
show that the method still performs well for cases where eigenvalues pass the 
imaginary axis away from the origin (Hopf bifurcations). 

In the case of J^-matrices we are able to keep the computation in the kernel 
of the constraint equation, i.e. for Stokes in the divergence free space, allowing 
us to use the CG method. Though the ^-matrices seem to be a limited class 
due to the constraints on the sparsity pattern in B, many applications lead to 
matrices of this type. 
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